Search for: All records

Creators/Authors contains: "Sun, Yizhou"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available October 7, 2026
  2. Free, publicly-accessible full text available July 16, 2026
  3. Free, publicly-accessible full text available July 1, 2026
  4. Free, publicly-accessible full text available June 29, 2026
  5. Free, publicly-accessible full text available July 20, 2026
  6. Large language models (LLMs) based on transformer architecture have shown outstanding performance across numerous real-world tasks. However, the autoregressive nature of these models makes the inference process slow and costly. Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens, which are then validated simultaneously by the larger model, achieving a speed-up of 1-2x. Although speculative decoding matches the distribution of multinomial sampling, multinomial sampling itself is prone to suboptimal outputs, whereas beam sampling is widely recognized for producing higher-quality results by maintaining multiple candidate sequences at each step. This paper explores the novel integration of speculative decoding with beam sampling. However, there are four key challenges: (1) how to generate multiple sequences from the larger model's distribution given draft sequences from the small model; (2) how to dynamically optimize the number of beams to balance efficiency and accuracy; (3) how to efficiently verify the multiple drafts in parallel; and (4) how to address the extra memory costs inherent in beam sampling. To address these challenges, we propose dynamic-width speculative beam decoding (DSBD). Specifically, we first introduce a novel draft and verification scheme that generates multiple sequences following the large model's distribution based on beam sampling trajectories from the small model. Then, we introduce an adaptive mechanism to dynamically tune the number of beams based on the context, optimizing efficiency and effectiveness. In addition, we extend tree-based parallel verification to handle multiple trees simultaneously, accelerating the verification process. Finally, we illustrate a simple modification to our algorithm to mitigate the memory overhead of beam sampling. Experimental results show that our approach achieves a 1.5-1.9x speed-up and 1.8-2.5x lower energy consumption compared to beam sampling, with no loss in downstream performance. Moreover, it can produce significantly higher-quality outputs than speculative decoding, while maintaining similar time, memory, and energy costs. In summary, our method offers a more efficient and effective inference process for LLMs.
    Free, publicly-accessible full text available April 11, 2026
  7. Free, publicly-accessible full text available April 24, 2026
  8. High-level synthesis (HLS) is widely used for designing Field Programmable Gate Arrays (FPGAs). HLS enables FPGA design with software programming languages by compiling the source code into an FPGA circuit. The source code includes a program (called a "kernel") and several pragmas that instruct hardware synthesis, such as parallelization and pipelining. While it is relatively easy for software developers to design the program, designing the pragmas relies heavily on hardware knowledge, posing a big challenge for software developers. Recently, different machine learning algorithms, such as GNNs, have been proposed to automate the pragma design via performance prediction. However, when applying the trained model to new kernels, the significant domain shift often leads to unsatisfactory performance. We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE) that can be flexibly adapted to any GNN model. Different expert networks can learn to deal with different regions in the representation space, and they can utilize similar patterns between the old kernels and new kernels. In the low-level MoE, we apply MoE on three natural granularities of a program: node, basic block, and graph. The high-level MoE learns to aggregate the three granularities for the final decision. To stably train the hierarchical MoE, we further propose a two-stage training method. Extensive experiments verify the effectiveness of the hierarchical MoE.
    Free, publicly-accessible full text available April 11, 2026
  9. Advances in data-driven design and additive manufacturing have substantially accelerated the development of truss metamaterials—three-dimensional truss networks—offering exceptional mechanical properties at a fraction of the weight of conventional solids. While existing design approaches can generate metamaterials with target linear properties, such as elasticity, they struggle to capture complex nonlinear behaviours and to incorporate geometric and manufacturing constraints—including defects—crucial for engineering applications. Here we present GraphMetaMat, an autoregressive graph-based framework capable of designing three-dimensional truss metamaterials with programmable nonlinear responses, originating from hard-to-capture physics such as buckling, frictional contact and wave propagation, along with arbitrary geometric constraints and defect tolerance. Integrating graph neural networks, physics biases, imitation learning, reinforcement learning and tree search, we show that GraphMetaMat can target stress–strain curves across four orders of magnitude and vibration transmission responses with varying attenuation gaps, unattainable by previous methods. We further demonstrate the use of GraphMetaMat for the inverse design of novel material topologies with tailorable high-energy absorption and vibration damping that outperform existing polymeric foams and phononic crystals, potentially suitable for protective equipment and electric vehicles. This work sets the stage for the automatic design of manufacturable, defect-tolerant materials with on-demand functionalities. 
    Free, publicly-accessible full text available July 1, 2026